Finding One's Best Crowd: Online Learning By Exploiting Source Similarity

Authors

Yang Liu

Mingyan Liu

Abstract

We consider an online learning problem (classification or prediction) involving disparate sources of sequentially arriving data, whereby a user over time learns the best set of data sources to use in constructing the classifier by exploiting their similarity. We first show that, when (1) the similarity information among data sources is known, and (2) data from different sources can be acquired without cost, then a judicious selection of data from different sources can effectively enlarge the training sample size compared to using a single data source, thereby improving the rate and performance of learning; we establish this by bounding the classification error of the resulting classifier. We then relax assumption (1) and characterize the loss in learning performance when the similarity information must also be acquired through repeated sampling. We further relax both (1) and (2) and present a cost-efficient algorithm that identifies a best crowd from a potentially large set of data sources in terms of both classifier performance and data acquisition cost. This problem has various applications, including online prediction systems with time series data of various forms, such as financial markets, advertising, and network measurement.
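The first setting in the abstract (similarity known, data free) can be illustrated with a toy sketch. This is not the authors' algorithm: the dissimilarity table, the `max_gap` threshold, the function names, and the one-dimensional midpoint classifier are all assumptions made purely for illustration. It only shows the idea that pooling data from sufficiently similar sources enlarges the training sample available to one source.

```python
from statistics import mean

def select_crowd(dissimilarity, target, max_gap):
    """Return the target plus every source whose (assumed known)
    dissimilarity to the target is at most max_gap."""
    crowd = [target]
    for (a, b), gap in dissimilarity.items():
        if gap <= max_gap:
            if a == target:
                crowd.append(b)
            elif b == target:
                crowd.append(a)
    return sorted(crowd)

def pool_samples(samples, crowd):
    """Concatenate the labeled samples of every source in the crowd."""
    pooled = []
    for name in crowd:
        pooled.extend(samples[name])
    return pooled

def fit_threshold(pooled):
    """Toy 1-D classifier: cut at the midpoint between the class means."""
    pos = [x for x, y in pooled if y == 1]
    neg = [x for x, y in pooled if y == 0]
    cut = (mean(pos) + mean(neg)) / 2
    return lambda x: 1 if x >= cut else 0

# Hypothetical data: (feature, label) pairs per source.
samples = {
    "A": [(0.2, 0), (0.9, 1)],
    "B": [(0.1, 0), (0.3, 0), (0.8, 1)],
    "C": [(0.9, 0), (0.1, 1)],  # labels flipped: dissimilar to A and B
}
# Hypothetical pairwise dissimilarities, assumed known in advance.
dissimilarity = {("A", "B"): 0.05, ("A", "C"): 0.9, ("B", "C"): 0.85}

crowd = select_crowd(dissimilarity, "A", max_gap=0.1)
pooled = pool_samples(samples, crowd)
predict = fit_threshold(pooled)
print(crowd, len(pooled), predict(0.7), predict(0.4))
```

Here source A alone has two samples, while its crowd contributes five; source C is excluded because pooling its data would bias the classifier. The later settings in the abstract, where similarity must itself be estimated and data carries a cost, are not covered by this sketch.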

Similar resources

Learning to Rank Scientific Documents from the Crowd

Motivation: Finding related published articles is an important task in any science, but with the explosion of new work in the biomedical domain it has become especially challenging. Most existing methodologies use text similarity metrics to identify whether two articles are related or not. However, biomedical knowledge discovery is hypothesis-driven. The most related articles may not be ones wit...

Genre Ontology Learning: Comparing Curated with Crowd-Sourced Ontologies

The Semantic Web has made it possible to automatically find meaningful connections between musical pieces, which can be used to infer their degree of similarity. Similarity, in turn, can be used by recommender systems driving music discovery or playlist generation. One useful facet of knowledge for this purpose is fine-grained genres and their inter-relationships. In this paper we present a meth...

A standard Interactive Multimedia eBook Generator Engine for e-Learning Process

Introduction: Using standard authoring tools is essential to promote e-learning in the teaching-learning process. Learning content in medical sciences often consists of multimedia elements. At the same time, medical content frequently needs to be revised and updated. Hence, access to authoring tools that can encompass multimedia elements and allow easy content revision is helpful in e-...

Exploiting Online Discussions in Collaborative Distributed Requirements Engineering

Large, distributed software development projects, like Open Source Software (OSS), adopt different collaborative working tools, including online forums and mailing list discussions, that are valuable sources of knowledge for requirements engineering tasks in software evolution, such as model revision and evolution. In our research, we aim at providing tool support for retrieving information from ...

Utilizing Online Social Network and Location-Based Data to Recommend Products and Categories in Online Marketplaces

Recent research has unveiled the importance of online social networks for improving the quality of recommender systems and encouraged the research community to investigate better ways of exploiting the social information for recommendations. To contribute to this sparse field of research, in this paper we exploit users’ interactions along three data sources (marketplace, social network and loca...

Publication date: 2016
